Natural Discourse Hypothesis Engine
Author
Abstract
As text generation systems get more sophisticated and capable of producing a wider syntactic and lexical range, the issue of how to choose among available grammatical options (preferably in a human-like way) becomes more pressing. Thus, devisers of text generation systems are frequently called upon to provide their own analyses of the discourse function of various kinds of alternations. In this paper, I describe a proposed research tool which is designed to help a researcher explore and analyze a natural-language "target text" in order to determine the contextual factors that predict the choice of one or another lexical item or grammatical feature. The project described in this paper is still at the drawing-board stage; I welcome suggestions about ways it could be changed or expanded to fulfill particular analytic needs.
Theoretical preliminaries
While some aspects of a natural-language text are determined by the nature of the information a speaker wishes to convey to a hearer, there are many more aspects that seem to be determined by certain cognitive needs that the hearer has. Speakers tailor their output to the informational resources and deficiencies of the hearer in several ways: by adjusting the amount of information they make explicit, by arranging new information relative to old information in a maximally helpful way, and by giving special marking to information that may be difficult for the hearer to access for a variety of reasons. It is these strategies that give rise to the wide variety of syntactic and lexical resources of any natural language for saying the "same thing" in different ways. We can call the relation between lexicogrammatical features and the speaker's communicative goals in choosing those features the "discourse functions" of the features. For any particular alternation, then, the best predictor of the speaker's choice should be a model of the cognitive state of the hearer.
Unfortunately, neither human speakers nor computer systems have direct access to the hearer's mind. But linguists have long realized that we do have access to a fair approximation of an important subset of the information the hearer possesses at a given point in a discourse: namely, the text which has been produced up to that point. ([Chafe 1987] and [Givón 1983] are two contemporary expressions of that principle.) And in fact we can make fairly good predictions of lexico-grammatical choices based on inferences that come from the nature of the preceding text. For instance, a referent that has been referred to in the previous clause is likely to receive minimal coding (a pronoun or zero, depending on syntactic considerations). But this principle can be overridden by the presence of other factors that interfere with the accessibility of the referent (e.g. a paragraph break or another competing referent), resulting in the use of a full noun phrase. Or, to give another example, a speaker is likely to use special syntax (such as the "presentative" or "there is..." construction) to introduce a referent that will be prominent in the following discourse; so a hearer is likely to have easier access to a referent introduced in that way than one that has been introduced "casually", e.g. in a prepositional phrase. Therefore, subsequent references to a referent that has been introduced with a presentative are more likely to be pronouns than noun phrases. These are all factors that can be discerned in the preceding text and are taken into account by speakers as affecting the nature of the hearer's expectations. Therefore, under these perturbing circumstances, the speaker can decide to use a fuller reference, e.g. a proper name or noun phrase. Figure 1 illustrates the relation between the discourse produced by the speaker, the hearer's mental state, and the speaker's model of the hearer.
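The tendencies described above can be pictured as a small decision procedure over features of the preceding text. The Python sketch below is only an illustration, not the proposed research tool: the `Context` fields, the function name `choose_reference_form`, and the ordering of the checks are all assumptions introduced here, and the categorical rules stand in for what the text describes as likelihoods.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Features of the preceding text for one referent (hypothetical names)."""
    mentioned_in_previous_clause: bool
    paragraph_break_since_last_mention: bool = False
    competing_referent_present: bool = False
    introduced_with_presentative: bool = False


def choose_reference_form(ctx: Context) -> str:
    """Return 'minimal' (pronoun or zero) or 'full_np' (name or noun phrase).

    Assumption: tendencies are treated as hard rules for simplicity.
    """
    # Perturbing factors reduce accessibility and call for a fuller reference.
    if ctx.paragraph_break_since_last_mention or ctx.competing_referent_present:
        return "full_np"
    # A referent mentioned in the previous clause tends to get minimal coding.
    if ctx.mentioned_in_previous_clause:
        return "minimal"
    # A presentative introduction makes the referent easier to access later.
    if ctx.introduced_with_presentative:
        return "minimal"
    # Otherwise assume the referent is not accessible enough for a pronoun.
    return "full_np"
```

For example, `choose_reference_form(Context(mentioned_in_previous_clause=True, competing_referent_present=True))` yields a full noun phrase, mirroring the override case in which a competing referent interferes with accessibility. A more realistic model would weight these factors rather than apply them in a fixed order.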
In real face-to-face interaction, the hearer can influence the speaker's model of her or him in more direct ways, e.g. by verbal and nonverbal indications of agreement, understanding, protest, or confusion. But this two-way channel is available neither to human writers nor to computer text generators.
Publication date: 1990